Abstract

Most research in the theory of evolutionary computation assumes that the problem at
hand has a fixed problem size. This assumption does not always apply to real-world
optimization challenges, where the length of an optimal solution may be unknown a priori.

Following up on previous work of Cathabard, Lehre, and Yao [FOGA 2011] we analyze 
variants of the (1+1) evolutionary algorithm for problems with unknown solution length. 
For their setting, in which the solution length is sampled from a geometric distribution, we 
provide mutation rates that yield an expected optimization time that is of the same order 
as that of the (1+1) EA knowing the solution length. 

We then show that almost the same run times can be achieved even if no a priori 
information on the solution length is available. 

Finally, we provide mutation rates suitable for settings in which neither the solution 
length nor the positions of the relevant bits are known. Again we obtain almost optimal run 
times for the OneMax and LeadingOnes test functions, thus solving an open problem 
from Cathabard et al. 


1 Introduction 

While the theory of evolutionary algorithms (EAs) in static problem settings is well
developed [1,6,9], the performance of EAs in uncertain environments has received far less
attention in the theoretical literature. Uncertainty can have many faces, for example with respect
to function evaluations, the variation operators, or the dynamics of the fitness function.
Understanding how evolutionary search algorithms can tackle such uncertain environments is an
emerging research topic; see [2] for a survey of examples in combinatorial optimization, and
[7] for an excellent survey that also discusses different sources of uncertainty.

In this work we study what evolutionary algorithms can achieve in the presence of
uncertainty with respect to the solution length. Quite surprisingly, we show that even some
variants of the simplest evolutionary algorithm, the (1 + 1) EA, can be very efficient for such
problems.

1.1 Previous Work 

Our work builds on previous work of Cathabard, Lehre, and Yao [4], who were the first to
consider, from a theoretical point of view, evolutionary algorithms in environments with unknown
solution lengths. Cathabard et al. assume that the solution length is sampled from a fixed and
known distribution D with finite support. More precisely, they assume that the solution length
n is sampled from a truncated version of the geometric distribution, in which the probability
mass for values greater than some threshold N is shifted to the event that n = N. In this
situation, the algorithm designer has access to both the upper bound N for the solution length
and the success probability q of the distribution.

Cathabard et al. analyze a variant of the (1 + 1) EA in which each bit is flipped with
probability 1/N, and they also study a variant with non-uniform bit flip probabilities. In the
latter, the i-th bit is flipped independently of all other bits with probability 1/(i + 1). They
show that these variants have polynomial expected run times on the OneMax and LeadingOnes
functions, where the expectation is taken with respect to the solution length and the random
decisions of the algorithm. An overview of the precise bounds obtained in [4] is given in Table 2.

1.2 Our Results 

We extend the work of Cathabard et al. in several ways. In a first step (Section 3) we show
that the regarded mutation probabilities are suboptimal. Making use of the concentration of
the (truncated) geometric distribution, we design bit flip probabilities that yield significantly 
smaller expected run times (for both the OneMax and the LeadingOnes function). We 
complement this finding by a lower bound that shows the optimality of our result. This proves 
that no mutation probabilities can yield a performance that is better by more than a constant 
factor than our suggested ones. 

While in the setting of Cathabard et al. we are in the convenient situation that we have full 
knowledge of the distribution D from which the solution length is sampled, one is sometimes 
faced with problems for which this knowledge is not readily available. We therefore study in 
Section 4 what can be done without any a priori knowledge about the solution length. In this 
situation we require that the algorithm designer chooses bit flip probabilities $(p_i)_{i \in \mathbb{N}}$ such that,
regardless of the solution length n, the expected run time of the (1 + 1) EA with bit flip
probabilities $(p_1, \ldots, p_n)$ is as small as possible. It is not obvious that this can be done in
polynomial time. In fact, for both algorithms studied by Cathabard et al. as well as for any 
uniform choice of the bit flip probabilities, the expected run time on this problem is exponential 
in n (cf. Theorems 13 and 14). 

We show (Theorems 15 and 16) that not only can we tackle this problem with non-uniform 
bit flip probabilities, but, quite surprisingly, this can be even done in a way that yields almost 
optimal run times. Indeed, our results are only a $\log^{1+\varepsilon} n$ factor worse than the best possible
$\Theta(n \log n)$ and $\Theta(n^2)$ run time bounds for OneMax and LeadingOnes, respectively. This
factor can be made even smaller as we shall comment at the end of Section 4.2. 

Finally, we provide in Section 4.3 a second way to deal with unknown solution lengths. 
We provide an alternative variant of the (1 + 1) EA in which the bit flip probabilities are 
chosen according to some (fixed) distribution at the beginning of each iteration. For suitably 
chosen distributions Q, the expected run times of the respective (1 + 1) EA$_Q$ on OneMax and
LeadingOnes are of the same asymptotic order as those of the previously suggested solution 
with non-uniform bit flip probabilities. In particular, they are, simultaneously for all possible 
solution lengths n, almost of the same order as the expected run time of a best possible (1+1) EA 
knowing the solution length. 

This second approach has an advantage over the non-uniform bit flip probabilities in that it 
effectively ignores bits that do not contribute anything to the fitness function (irrelevant bits). 
Thus, even if only n bits at unknown positions have an influence on the fitness function, the 
same run time bounds apply. In contrast, all previously suggested solutions require that the 
n relevant bits are the leftmost ones. This also answers a question posed by Cathabard et 
al. [4, Section 6]. 

setting               bit flip prob.     OneMax                                     LeadingOnes

Random Length         unif. and fixed    $\Theta(q^{-1} \log q^{-1})$   Thm. 7      $\Theta(q^{-2})$   Thm. 10
$\sim$ Geo(q)

Adversarial Length    unif. and fixed    $2^{\Omega(n)}$   Thm. 13                  $2^{\Omega(n)}$   Thm. 14
                      fixed              $O(n \log^{2+\varepsilon} n)$   Cor. 17    $O(n^2 \log^{1+\varepsilon} n)$   Cor. 17
                      unif. and rand.    $O(n \log^{2+\varepsilon} n)$   Cor. 20    $O(n^2 \log^{1+\varepsilon} n)$   Cor. 20

Table 1: Overview of Results for $1/N \le q \le 1/2$ and $\varepsilon > 0$.
Our run time results are summarized in Tables 1 and 2. 


2 Algorithms and Problems 

In this section we define the algorithms and problems considered in this paper. For any problem
size n, fitness function $f : \{0,1\}^n \to \mathbb{R}$, and vector $\vec{p} = (p_1, \ldots, p_n)$ of bit flip probabilities
$0 \le p_i \le 1$, we consider the (1 + 1) EA$_{\vec{p}}$, as given by Algorithm 1.


Algorithm 1: The (1 + 1) EA$_{\vec{p}}$ for $\vec{p} = (p_1, \ldots, p_n)$ optimizing a pseudo-Boolean function
$f : \{0,1\}^n \to \mathbb{R}$

Initialization: Sample $x \in \{0,1\}^n$ uniformly at random and query $f(x)$;
Optimization: for t = 1, 2, 3, ... do
    for i = 1, ..., n do
        With probability $p_i$ set $y_i \leftarrow 1 - x_i$ and set $y_i \leftarrow x_i$ otherwise;
    Query $f(y)$;
    if $f(y) \ge f(x)$ then $x \leftarrow y$;


The (1 + 1) EA$_{\vec{p}}$ samples an initial search point from $\{0,1\}^n$ uniformly at random. It then
proceeds in rounds, each of which consists of a mutation and a selection step. Throughout the
whole optimization process the (1 + 1) EA$_{\vec{p}}$ maintains a population size of one, and the individual
in this population is always a best-so-far solution. In the mutation step of the (1 + 1) EA$_{\vec{p}}$ the
current-best solution x is mutated by flipping the bit in position i with probability $p_i$, $1 \le i \le n$.
The fitness of the resulting search point y is evaluated and in the selection step the parent x is
replaced by its offspring y if and only if the fitness of y is at least as good as the one of x. Since
we consider maximization problems here, this is the case if $f(y) \ge f(x)$. Since we are interested
in expected run times, i.e., the expected number of rounds it takes until the (1 + 1) EA$_{\vec{p}}$ evaluates
for the first time a solution of maximal fitness, we do not specify a termination criterion. It is
not difficult to see that the (1 + 1) EA$_{\vec{p}}$ indeed generalizes the standard (1 + 1) EA. In fact, we
obtain the (1 + 1) EA from the (1 + 1) EA$_{\vec{p}}$ if we set $p_i = 1/n$ for all $i \in [n] := \{1, \ldots, n\}$. We
call such mutation vectors with $p_i = p_j$ for all i, j uniform mutation rates, while we speak of
non-uniform mutation rates if $p_i \ne p_j$ for at least one pair (i, j).
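The algorithm described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the parameters `f_max` (the optimal fitness value, used to stop the loop) and `max_rounds` (a safety budget) are assumptions added here, since the paper deliberately specifies no termination criterion.

```python
import random
from typing import Callable, List, Optional, Sequence


def one_plus_one_ea(
    f: Callable[[Sequence[int]], float],
    p: Sequence[float],
    f_max: float,
    max_rounds: int = 1_000_000,
    rng: Optional[random.Random] = None,
) -> int:
    """Count the rounds until a search point of fitness f_max is evaluated.

    f_max and max_rounds are hypothetical helpers for this sketch; they are
    not part of Algorithm 1, which runs without a termination criterion.
    """
    rng = rng or random.Random()
    # Initialization: sample x uniformly at random from {0,1}^n, n = len(p).
    x: List[int] = [rng.randint(0, 1) for _ in p]
    fx = f(x)
    rounds = 0
    while fx < f_max and rounds < max_rounds:
        # Mutation: flip bit i independently with probability p[i].
        y = [1 - xi if rng.random() < pi else xi for xi, pi in zip(x, p)]
        fy = f(y)
        rounds += 1
        # Selection: the offspring replaces the parent if it is at least as good.
        if fy >= fx:
            x, fx = y, fy
    return rounds
```

For instance, `one_plus_one_ea(sum, [1 / 50] * 50, f_max=50)` runs the standard (1 + 1) EA with uniform mutation rate 1/n on OneMax with n = 50.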

                      Results from [4]                                  Thms. 7 (OM), 10 (LO)
problem          Cor. 4                          $p_i = 1/N$                $p_i = 1/(i+1)$          $p_i = q/2$                     $p_i = q$
OneMax$_D$       $\Omega(q^{-1} \log q^{-1})$    $\Theta(N \log q^{-1})$    $O(q^{-2} \log N)$       $\Theta(q^{-1} \log q^{-1})$    $O(N \log N)$
LeadingOnes$_D$  $\Omega(q^{-2})$                $\Theta(N/q)$              $\Theta(q^{-3})$         $\Theta(q^{-2})$                $\Theta(N/q)$

Table 2: Expected run times of the (1 + 1) EA$_{\vec{p}}$ with $\vec{p} = (p_1, \ldots, p_N)$ for D = TrunkGeo(N, q) and
$1/N \le q \le 1/2$.

The two test functions we consider in this work are OneMax and LeadingOnes. For a
given problem size n, they are defined as

\[ \mathrm{Om}_n := \mathrm{OneMax}_n(x) = \sum_{i=1}^{n} x_i, \quad \text{and} \]

\[ \mathrm{Lo}_n := \mathrm{LeadingOnes}_n(x) = \max\{ i \in [0..n] \mid \forall j \le i : x_j = 1 \}, \]

where $[0..n] := \{0\} \cup [n]$. That is, the OneMax function counts the number of ones in a
bit string, while the LeadingOnes function counts the number of initial ones. While these
two functions are certainly easy to optimize without evolutionary algorithms, the (1 + 1) EA$_{\vec{p}}$
performs exactly the same on all generalized OneMax and LeadingOnes functions, which are
obtained from the functions above through an XOR with an arbitrary and unknown bit string
$z \in \{0,1\}^n$. Understanding how an evolutionary algorithm behaves on these two functions
is an important indicator for how it manages to cope with the easier parts of more complex
optimization problems. For this reason, OneMax and LeadingOnes are the two
best-studied problems in the theory of evolutionary computation.
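The two test functions and their XOR generalization can be written down directly. The following is a small sketch; the helper name `xor_generalized` is an assumption introduced here for illustration, and it implements the reading "evaluate f on x XOR z", under which the optimum is the bitwise complement of z.

```python
from typing import Callable, Sequence


def one_max(x: Sequence[int]) -> int:
    """Number of ones in the bit string x."""
    return sum(x)


def leading_ones(x: Sequence[int]) -> int:
    """Length of the longest prefix of x that consists only of ones."""
    count = 0
    for bit in x:
        if bit != 1:
            break
        count += 1
    return count


def xor_generalized(
    f: Callable[[Sequence[int]], int], z: Sequence[int]
) -> Callable[[Sequence[int]], int]:
    """Hypothetical helper: return the function x -> f(x XOR z)."""
    def g(x: Sequence[int]) -> int:
        return f([xi ^ zi for xi, zi in zip(x, z)])
    return g
```

Because the mutation operator treats zeros and ones symmetrically, the run time distribution of the algorithm does not depend on the hidden string z.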

If a distribution D is known from which the solution length is sampled, we consider the
expected run time of the (1 + 1) EA$_{\vec{p}}$ on OneMax$_D$ and LeadingOnes$_D$, respectively, which
are the problems Om$_n$ resp. Lo$_n$ with random solution length $n \sim D$. Note here that the
expectation is thus taken both with respect to the random solution length and with respect to
the random samples of the algorithm.

3 Random Solution Length 

We first consider the setting that has been introduced by Cathabard, Lehre, and Yao [4].
After a short presentation of the model in Section 3.1, a general lower bound for this problem 
(Section 3.2), and the results of [4] in Section 3.3, we show that the bounds in [4] can be 
improved by using different (uniform) mutation rates (Section 3.4). 

Table 2 summarizes the previously known bounds and our contributions for the setting 
regarded in this section. 

3.1 The Model 

Cathabard et al. [4] consider the following model. The algorithm designer knows the distribution 
D from which the unknown solution length is drawn; only distributions with finite support are 
considered, so the algorithm designer knows an upper bound N on the actual solution length n. 
He also knows the class of functions from which the optimization problem is taken (for example 
OneMax or LeadingOnes). 
Based on this knowledge, the algorithm designer chooses a vector $(p_1, \ldots, p_N)$ of bit flip
probabilities indicating with which probability a bit is flipped in each round. In this work we
also regard a slightly more general model in which the distributions over $\mathbb{N}$ may possibly have
infinite support; the algorithm designer then chooses an infinite sequence of bit flip probabilities
$(p_1, p_2, \ldots) = (p_i)_{i \in \mathbb{N}}$. After this choice of bit flip probabilities, the actual solution length n is
sampled from the given distribution D. Then the (1 + 1) EA$_{\vec{p}}$ (Algorithm 1) is run with mutation
probabilities $\vec{p} = (p_1, \ldots, p_n)$ on the given problem with the given problem length.

Cathabard et al. [4] consider as distribution D the following truncated geometric distribution,
which is based on a geometric distribution in which the probability mass for values greater than N
is moved to N.

Definition 1 ([4]). The truncated geometric distribution TrunkGeo(N, q) with truncation
parameter N and success probability $q \in (0, 1/N]$ satisfies, for all $n \in \mathbb{N}$, that the probability of
TrunkGeo(N, q) = n is

\[
\begin{cases}
q (1 - q)^{n-1} & \text{if } 1 \le n \le N - 1, \\
(1 - q)^{n-1} & \text{if } n = N, \\
0 & \text{otherwise.}
\end{cases}
\]

Note that the truncated geometric distribution recovers the geometric distribution Geo(q)
for $N = \infty$.
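As an illustration of Definition 1, the probability mass function and a simple sampler can be sketched as follows. The function names are assumptions made for this example; the sampler repeats Bernoulli(q) trials and stops at the first success or once the truncation value N is reached.

```python
import random
from typing import Optional


def trunk_geo_pmf(n: int, N: int, q: float) -> float:
    """Probability that TrunkGeo(N, q) takes the value n (Definition 1)."""
    if 1 <= n <= N - 1:
        return q * (1.0 - q) ** (n - 1)
    if n == N:
        # All mass of the geometric distribution beyond N - 1 is moved to N.
        return (1.0 - q) ** (n - 1)
    return 0.0


def sample_trunk_geo(N: int, q: float, rng: Optional[random.Random] = None) -> int:
    """Draw one solution length from TrunkGeo(N, q)."""
    rng = rng or random.Random()
    n = 1
    # Each failed trial (probability 1 - q) increases the length by one.
    while n < N and rng.random() >= q:
        n += 1
    return n
```

Summing `n * trunk_geo_pmf(n, N, q)` over n = 1, ..., N gives the mean $(1 - (1-q)^N)/q$, which is at most $1/q$.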

It is well known (see also [4, Proposition 1]) that for X = Geo(q) and
Y = TrunkGeo(N, q) with $q \ge 1/N$

\[ E[X] = q^{-1} \quad \text{and} \quad E[Y] = \Theta(q^{-1}). \tag{1} \]

Note that we trivially have $E[Y] \le E[X]$.

3.2 A General Lower Bound 

What is a good lower bound for the expected run time of any (1 + 1) EA$_{\vec{p}}$ on OneMax
or LeadingOnes when the length is sampled from some given distribution D on $\mathbb{N}$? If the
algorithm designer knew the true length n before having to decide upon the mutation
probabilities $(p_1, \ldots, p_n)$, then the optimal bit flip probability for this solution length could be
chosen. For OneMax, the best choice is to set $\vec{p} = (1/n, \ldots, 1/n)$, as has been proven in [10,11]
(note here that for fixed problem sizes, due to the symmetry of OneMax, non-uniform mutation
rates cannot be advantageous over uniform ones). This results in an expected run time of
$\Theta(n \log n)$.

For LeadingOnes, if the true length n is known, any choice of the bit flip probabilities
$\vec{p}$ leads to an expected run time of $\Omega(n^2)$, as the next lemma shows.

Lemma 2. For any fixed solution length n and any vector $\vec{p} = (p_1, \ldots, p_n)$ of mutation
probabilities, the expected run time of the (1 + 1) EA$_{\vec{p}}$ on LeadingOnes$_n$ is $\Omega(n^2)$.

Proof. It is easy to see by arguments that are mostly identical to the ones in [3, Section 3.3]
that the expected run time of the (1 + 1) EA$_{\vec{p}}$ on LeadingOnes$_n$ is

\[ \frac{1}{2} \sum_{i=1}^{n} \frac{1}{p_i \prod_{j=1}^{i-1} (1 - p_j)}. \]

Using this bound one can easily show that we can assume without loss of generality that the
mutation probabilities are monotonically increasing, i.e., $p_i \le p_{i+1}$ holds for all $i \in [n-1]$. Indeed, if
$p_k > p_{k+1}$ holds for some $k \in [n-1]$, then the expected run time of the (1 + 1) EA$_{\vec{p}}$ is larger than
that of the (1 + 1) EA$_{\vec{q}}$ with $\vec{q} = (q_1, \ldots, q_n)$, $q_k = p_{k+1}$, $q_{k+1} = p_k$, and $q_i = p_i$ for $i \notin \{k, k+1\}$.

We now regard the time it takes the (1 + 1) EA$_{\vec{p}}$ to produce for the first time a search point
of fitness at least $k := \lfloor n/3 \rfloor$. Following [3] this takes in expectation

\[ \sum_{i=1}^{k} \frac{1}{2 p_i \prod_{j=1}^{i-1} (1 - p_j)} \ge \sum_{i=1}^{k} \frac{1}{2 p_k} = \Omega(n/p_k) \tag{2} \]

fitness evaluations.

Furthermore, we have

\[ \prod_{j=k}^{2k-1} (1 - p_j) \le (1 - p_k)^k \le e^{-p_k k}, \]

which shows that the (1 + 1) EA$_{\vec{p}}$ spends in expectation

\[ \frac{1}{2 p_{2k+1} \prod_{j=1}^{2k} (1 - p_j)} \ge \frac{e^{p_k k}}{2} \tag{3} \]

iterations on fitness level 2k.

Equations (2) and (3) prove that the overall expected optimization time of the (1 + 1) EA$_{\vec{p}}$
on LeadingOnes$_n$ is $\Omega(n/p_k + \exp(p_k k/2))$. For all possible choices of $p_k$ this expression is
$\Omega(n^2)$ as can be easily seen using a case distinction (for $p_k = O(1/n)$ the first summand is
$\Omega(n^2)$, while for $p_k = \omega(1/n)$ the second one is growing at an exponential rate). □


Using these lower bounds for fixed solution lengths, Jensen's Inequality and the convexity
of $n \mapsto n \log n$ and $n \mapsto n^2$, respectively, we get the following general lower bound.


Theorem 3. Let D be any distribution on $\mathbb{N}$ with a finite expectation of m. Then the expected
run time of any (1 + 1) EA$_{\vec{p}}$ on OneMax$_D$ is $\Omega(m \log m)$ and the expected run time of any
(1 + 1) EA$_{\vec{p}}$ on LeadingOnes$_D$ is $\Omega(m^2)$. Both bounds apply also to the setting in which
the algorithm designer can choose the mutation probabilities $\vec{p} = (p_1, \ldots, p_n)$ after the solution
length $n \sim D$ has been drawn.

Using Equation (1), we get the following corollary. 

Corollary 4. Let $N \in \mathbb{N}$ and $q \ge 1/N$. Let D = TrunkGeo(N, q) or D = Geo(q). The expected
run time of any (1 + 1) EA$_{\vec{p}}$ on OneMax$_D$ is $\Omega(q^{-1} \log q^{-1})$ and the expected run time of any
(1 + 1) EA$_{\vec{p}}$ on LeadingOnes$_D$ is $\Omega(q^{-2})$. Both bounds apply also to the setting in which
the algorithm designer can choose the mutation probabilities $\vec{p} = (p_1, \ldots, p_n)$ after the solution
length $n \sim D$ has been drawn.


3.3 Known Upper Bounds 

Cathabard et al. [4] analyze the run time of the (1 + 1) EA$_{\vec{p}}$ with uniform mutation probabilities
$p_1 = \ldots = p_N = 1/N$ and of the (1 + 1) EA$_i$ with $p_i = 1/(i + 1)$, $1 \le i \le N$.

For OneMax they obtain the following results. 

Theorem 5 (Results for OneMax from [4]). Let $N \in \mathbb{N}$, $\varepsilon \in (0,1)$, and $q = N^{-\varepsilon}$. For
D = TrunkGeo(N, q) the expected run time of the (1 + 1) EA$_{\vec{p}}$ with $\vec{p} = (1/N, \ldots, 1/N)$ on
OneMax$_D$ is $\Theta(N \log q^{-1})$, while the expected run time of the (1 + 1) EA$_i$ on OneMax$_D$ is
$O(q^{-2} \log N)$.
This result shows that the (1 + 1) EA$_{\vec{p}}$ with $\vec{p} = (1/N, \ldots, 1/N)$ outperforms the (1 + 1) EA$_i$
for $q < 1/\sqrt{N}$, while the latter algorithm is preferable for larger q. As we shall see in the
following section one should not conclude from this result that non-uniform bit flip probabilities
are the better choice for this problem.

Remark: By using a slightly more careful analysis than presented in [4], the bound for the
(1 + 1) EA$_i$ on OneMax$_D$ can be improved to $O(q^{-2} \log q^{-1})$. In fact, an analysis similar to
the one in Section 3.4, that is, disregarding outcomes that are much larger than the expectation,
will give that result. It can also be shown that the requirement $q = N^{-\varepsilon}$ is not needed as the
$O(q^{-2} \log q^{-1})$ bound holds for all $q \ge 1/N$. It also holds for the (non-truncated) geometric distribution
D = Geo(q).

For LeadingOnes, Cathabard et al. show the following results. 

Theorem 6 (Results for LeadingOnes from [4]). For N, $\varepsilon$, q, and D as in Theorem 5, the
expected run time of the (1 + 1) EA$_{\vec{p}}$ with $\vec{p} = (1/N, \ldots, 1/N)$ on LeadingOnes$_D$ is $\Theta(N q^{-1})$,
while the expected run time of the (1 + 1) EA$_i$ on LeadingOnes$_D$ is $\Theta(q^{-3})$.

Thus also for LeadingOnes the (1 + 1) EA$_i$ performs better than the (1 + 1) EA$_{\vec{p}}$ with
$\vec{p} = (1/N, \ldots, 1/N)$ when $q > 1/\sqrt{N}$, while the uniform (1 + 1) EA$_{\vec{p}}$ should be preferred for
smaller q.

Remark: As in the OneMax case the $\Theta(q^{-3})$ bound for the (1 + 1) EA$_i$ holds more
generally for all geometric distributions Geo(q) with parameter $q \ge 1/N$.

From Theorems 5 and 6 we can see that for both OneMax$_D$ and LeadingOnes$_D$ the
(1 + 1) EA$_i$ loses a factor of $1/q$ with respect to the lower bound given by Corollary 4. This
will be improved in the following section.

3.4 Optimal Upper Bounds With Uniform Mutation Probabilities 

We show that for D being the (truncated or non-truncated) geometric distribution there exist
bit flip probabilities $\vec{p} = (p_1, \ldots, p_N)$ and $\vec{p} = (p_i)_{i \in \mathbb{N}}$, respectively, such that the expected run
time of the (1 + 1) EA$_{\vec{p}}$ on OneMax$_D$ and LeadingOnes$_D$ is significantly lower than those of
the two algorithms studied by Cathabard et al. The expected run times of our algorithm match
the lower bounds given in Corollary 4 and are thus optimal in asymptotic terms.

In both cases, i.e., both for OneMax$_D$ and for LeadingOnes$_D$, the mutation rates yielding
the improvement over the results in [4] are uniform. Our results therefore imply that for these
two problems, contrary to what was conjectured in [4], one cannot gain more than constant factors from using
non-uniform mutation probabilities.

The key observation determining our choice of the mutation probability is the fact that the 
(truncated) geometric distribution is highly concentrated. Hence, if we know the parameters 
of the distribution, we can choose the mutation probability such that it is (almost) reciprocal 
in each position to the expected length of the solution. Thus, in the setting of [4], i.e., for the 
truncated geometric distribution with parameters N and q, we set $p_i := q/2$ for all $i \in [N]$
(recall equation (1)). Our approach naturally also works for the (non-truncated) geometric
distribution Geo(q), which is also highly concentrated around its mean $1/q$.

We remark without proof that similar results hold for other distributions that are highly
concentrated around the mean, e.g., binomial distributions, and also highly concentrated
unbounded distributions, such as Poisson distributions.

Theorem 7. For $N \in \mathbb{N}$ let $1/N \le q = q(N) \le 1/2$. For D = Geo(q) and D = TrunkGeo(N, q)
the expected run time of the (1 + 1) EA$_{\vec{p}}$ with $\vec{p} = (q/2, \ldots, q/2)$ on OneMax$_D$ is $\Theta(q^{-1} \log q^{-1})$.

For the proof we will use the following upper bound for the expected run time of the 
(1 + 1) EA on OneMax. A similar upper bound can be found in [11, Theorem 4.1]. 
Lemma 8 ([10, Theorem 8]). For a fixed length n and a uniform mutation vector $\vec{p} = (p, \ldots, p)$
with $0 < p < 1$, the expected run time of the (1 + 1) EA$_{\vec{p}}$ on OneMax$_n$ is at most
$(\ln(n) + 1)/(p(1 - p)^n)$.

Proof of Theorem 7. We first consider D = TrunkGeo(N, q). We do not worry about constant
factors in this analysis and thus bound some expressions generously.

Using Lemma 8 we can bound the expected run time of the (1 + 1) EA$_{\vec{p}}$ on OneMax$_D$ from
above by

\[ \sum_{n=1}^{N-1} \frac{q (1-q)^{n-1} (\ln(n) + 1)}{(q/2)(1 - q/2)^n} + \frac{(1-q)^{N-1} (\ln(N) + 1)}{(q/2)(1 - q/2)^N}. \tag{4} \]

To bound the last summand in this expression, we first observe that, for all positive n,

\[ \left(1 - \frac{q}{2}\right)^{n} = \left(1 - q + \frac{q^2}{4}\right)^{n/2} \ge (1 - q)^{n/2}. \tag{5} \]

This shows that the last summand in (4) is at most

\[ 2 (1 - q)^{N/2 - 1} (\ln(N) + 1)/q, \]

which is $O(q^{-1} \log q^{-1})$. This can be seen as follows. For $q \ge 2 \ln\ln(N)/N$ it holds (using
the inequality $1 - q \le \exp(-q)$) that $(1 - q)^{N/2 - 1} \le \exp(-qN/2) \le 1/\ln(N)$ and thus
$2 (1 - q)^{N/2 - 1} (\ln(N) + 1)/q = O(1/q)$, while for $1/N \le q < 2 \ln\ln(N)/N$ we have (for some
suitably chosen constant C) $(1 - q)^{N/2} \ln(N) \le (1 - 1/N)^{N/2} \ln(N) \le C(\ln(N) - \ln(2 \ln\ln N)) =
C \ln(N/(2 \ln\ln N)) \le C \ln(1/q)$.

Using again (5) we bound the first part of the sum (4) by

\[
\frac{2}{1-q} \sum_{n=1}^{N-1} \frac{(1-q)^n (\ln(n) + 1)}{(1 - q/2)^n}
\le \frac{2}{1-q} \sum_{n=1}^{N-1} (\ln(n) + 1)(1-q)^{n/2}
= 2 \sum_{n=1}^{N-1} (\ln(n) + 1)(1-q)^{n/2 - 1}.
\]

To show that this expression is $O(q^{-1} \log q^{-1})$ we split the sum into blocks of length $k := \lceil 1/q \rceil$
and use again the inequality $1 - q \le \exp(-q)$. This shows that the last expression is at most

\[
2 \sum_{j=0}^{\lceil N/k \rceil - 1} \sum_{\ell=1}^{k} \exp\left(-q\left(\frac{jk + \ell}{2} - 1\right)\right) (\ln(jk + \ell) + 1)
\le 2k \sum_{j=0}^{\lceil N/k \rceil - 1} \exp\left(-\left(\frac{j}{2} - 1\right)\right) (\ln(j+1) + \ln(k) + 1)
= O(k \ln k),
\]

where the last equality can best be seen by first considering that
$\sum_{j=0}^{\lceil N/k \rceil - 1} \exp(-(j/2 - 1))(\ln(k) + 1) = \Theta(\log k)$, while
$\sum_{j=1}^{\lceil N/k \rceil - 1} \exp(-(j/2 - 1)) \ln(j + 1) = O(1)$. Summarizing the computations above we
see that (4) is of order at most $q^{-1} \log q^{-1}$.








For D = Geo(q) the computations are almost identical. By Lemma 8 and (5) the expected
run time of the (1 + 1) EA$_{\vec{p}}$ on OneMax$_D$ is at most

\[
\frac{2}{1-q} \sum_{n=1}^{\infty} \frac{(1-q)^n (\ln(n) + 1)}{(1 - q/2)^n}
\le 2 \sum_{n=1}^{\infty} (1-q)^{n/2 - 1} (\ln(n) + 1) = O(q^{-1} \log q^{-1}),
\]

which can be seen in a similar way as above by splitting the sum into blocks of size $k := \lceil 1/q \rceil$
and using $1 - q \le \exp(-q)$. □


It is interesting to note that the expected run time increases to between $\Omega(N)$ and
$O(N \log N)$ when the mutation probability is chosen to be $\vec{p} = (q, \ldots, q)$. This can easily
be seen as follows. For the upper bound we use Lemma 8 (ignoring the "+1" terms which are
easily seen to play an insignificant role) to obtain that the expected run time of the (1 + 1) EA$_{\vec{p}}$
with $\vec{p} = (q, \ldots, q)$ on OneMax$_{\mathrm{TrunkGeo}(N,q)}$ is at most

\[
\sum_{n=1}^{N-1} \frac{q (1-q)^{n-1} \ln n}{q (1-q)^n} + \frac{(1-q)^{N-1} \ln N}{q (1-q)^N}
= \sum_{n=1}^{N-1} \frac{\ln n}{1-q} + O(\log(N)/q)
= \frac{\ln((N-1)!)}{1-q} + O(N \log N) = O(N \log N).
\]


We can derive a strong lower bound of $\Omega(N \log N)$ in the case of $2^{-N/3} \le q \le 1/N$ from the
following one for static solution lengths.


Lemma 9 ([10, Theorem 9], [11, Theorem 4.1]). For a fixed length n and a uniform mutation
vector $\vec{p} = (p, \ldots, p)$, the expected run time of the (1 + 1) EA$_{\vec{p}}$ on OneMax$_n$ is at least
$(\ln(n) - \ln\ln n - 3)/(p(1 - p)^n)$ for $2^{-n/3} \le p \le 1/n$ and at least
$(\ln(1/(p^2 n)) - \ln\ln n - 3)/(p(1 - p)^n)$ for $1/n \le p \le 1/(\sqrt{n} \log n)$.


Thus, the expected run time of the (1 + 1) EA$_{\vec{p}}$ with $\vec{p} = (q, \ldots, q)$ and $2^{-N/3} \le q \le 1/N$ on
OneMax$_{\mathrm{TrunkGeo}(N,q)}$ is at least

\[
\sum_{n=1}^{N-1} q (1-q)^{n-1} \frac{\ln(n) - \ln\ln n - 3}{q (1-q)^n}
\ge \sum_{n=1}^{N-1} \frac{1}{2} \cdot \frac{\ln n}{1-q}
= \frac{1}{2} \cdot \frac{\ln((N-1)!)}{1-q} = \Omega(N \log N).
\]

Similarly we can get a lower bound of $\Omega(N)$ in case of $1/N \le q \le 1/(\sqrt{N} \log N)$
by using the lower bound of $1/(q(1 - q)^n)$ for any fixed solution length n.

We now turn our attention to the LeadingOnes problem, where a similar approach as
above yields the following result.


Theorem 10. Let $N \in \mathbb{N}$ and $1/N \le q \le 1/2$. For D = TrunkGeo(N, q) and D = Geo(q) the
expected run time of the (1 + 1) EA$_{\vec{p}}$ with $\vec{p} = (q/2, \ldots, q/2)$ on LeadingOnes$_D$ is $\Theta(q^{-2})$.

We will derive this result from the following lemma, which was independently proven in 
[3, Theorem 3], [10, Corollary 2], and in a slightly weaker form in [8, Theorem 1.2]. 

Lemma 11 ([3], [10], and [8]). For a fixed length n and a mutation vector $\vec{p} = (p, \ldots, p)$
with $0 < p < 1/2$, the expected run time of the (1 + 1) EA$_{\vec{p}}$ on LeadingOnes$_n$ is exactly
$\frac{1}{2p^2} \left( (1 - p)^{-n+1} - (1 - p) \right)$.

Proof of Theorem 10. We first consider the case that the solution length is sampled from the
truncated geometric distribution TrunkGeo(N, q). Using Lemma 11 and (5) (in the third and
in the last step) the expected run time of the (1 + 1) EA$_{\vec{p}}$ on LeadingOnes$_D$ is

\[
\begin{aligned}
& \sum_{n=1}^{N-1} q (1-q)^{n-1} \frac{2}{q^2} \left( (1 - q/2)^{-n+1} - (1 - q/2) \right) + A \\
& \quad \le \frac{2}{q} \sum_{n=1}^{N-1} \left( \frac{1-q}{1 - q/2} \right)^{n-1} + A \\
& \quad \le \frac{2}{q} \sum_{n=0}^{\infty} (1-q)^{n/2} + A \\
& \quad = \frac{2}{q} \cdot \frac{1}{1 - (1-q)^{1/2}} + A = O(q^{-2}) + A,
\end{aligned}
\]

where A is the summand that accounts for the event that the solution length is N, i.e.,

\[ A = (1-q)^{N-1} \frac{2}{q^2} \left( (1 - q/2)^{-N+1} - (1 - q/2) \right) = O(q^{-2}). \]

Similarly for D = Geo(q) the expected run time of the (1 + 1) EA$_{\vec{p}}$ on LeadingOnes$_D$ is
bounded from above by

\[
\frac{2}{q} \sum_{n=1}^{\infty} \frac{(1-q)^{n-1}}{(1 - q/2)^{n-1}}
\le \frac{2}{q} \sum_{n=0}^{\infty} (1-q)^{n/2}
= \frac{2}{q} \cdot \frac{1}{1 - (1-q)^{1/2}} \le \frac{4}{q^2},
\]

where we recall that the last step follows from (5) for n = 1, which provides $(1 - q)^{1/2} \le
1 - q/2$. □


Just as for OneMax$_D$ (with D = TrunkGeo(N, q)) we see that also on
LeadingOnes$_D$ the expected run time increases (in this case to $\Theta(N/q)$) when the
mutation probability is chosen to be $\vec{p} = (q, \ldots, q)$. By Lemma 11 this run time equals

\[
\sum_{n=1}^{N-1} \frac{q (1-q)^{n-1}}{2 q^2} \left( (1-q)^{-n+1} - (1-q) \right) + A
= \frac{1}{2q} \sum_{n=1}^{N-1} \left( 1 - (1-q)^n \right) + A
= \frac{1}{2q} \left( N - 1 - \frac{(1-q)\left(1 - (1-q)^{N-1}\right)}{q} \right) + A
= \Theta(N/q) + A,
\]

where A is the summand that accounts for the event that the solution length is N, i.e.,

\[ A = \frac{(1-q)^{N-1}}{2 q^2} \left( (1-q)^{-N+1} - (1-q) \right) = \Theta(q^{-2}). \]


4 Arbitrary Solution Lengths 

In the setting described in Section 3 it is assumed that the algorithm designer has quite a 
good knowledge about the solution length. Not only does he know an upper bound N on the 
solution length, but he may also crucially exploit its distribution. Indeed, we make quite heavy 
use in Theorems 7 and 10 of the fact that the (truncated) geometric distribution is highly 
concentrated around its expected value. That so much information is available to the algorithm 
designer can be a questionable assumption in certain applications. We therefore regard in this 
section a more general setting in which no a priori information is given about the possible 
solution length n. That is, we regard a setting in which the solution length can be an arbitrary 
positive integer. In this setting neither do we have any upper bounds on n nor any information 
about its distribution. 
As before, our task is to decide on a sequence of mutation probabilities $0 \le p_i \le 1$.
An adversary may then choose the solution length n and we run the (1 + 1) EA$_{\vec{p}}$ with
$\vec{p} = (p_1, \ldots, p_n)$. In practical applications, this can be implemented with a (possibly generous)
upper bound on the problem size.

We first show that uniform fixed bit flip probabilities necessarily lead to exponential run 
times (see Section 4.1). We then show two ways out of this problem. In Section 4.2 we consider 
non-uniform bit flip probabilities and in Section 4.3 we show that we can have an efficient 
algorithm with uniform bit flip probabilities if we choose the bit flip probability randomly in 
each iteration. 

4.1 Uniform Bit Flip Probabilities 

It seems quite intuitive that if nothing is known about the solution length there is not much 
we can achieve with uniform bit flip probabilities. In fact, for any fixed mutation probability
$p \in [0,1]$, we just need to choose a large enough solution length n to see that the (1 + 1) EA$_{\vec{p}}$
with uniform mutation probability p is very inefficient.

More precisely, using the following statement (which is a simplified version of [11, Theorem 
6.5]) we get the lower bound regarding optimizing OneMax with uniform bit flip probabilities 
stated in Theorem 13. 

Theorem 12 (from [11]). Let $0 < \varepsilon < 1$ be a constant. On any linear function, the expected
optimization time of the (1 + 1) EA$_{\vec{p}}$ with $\vec{p} = (p, \ldots, p)$ and $p = O(n^{-2/3-\varepsilon})$ is bounded from
below by

\[ (1 - o(1)) \, \frac{1}{p (1-p)^n} \, \min\left\{ \ln(n), \ln\left( \frac{1}{p^3 n^2} \right) \right\}. \]

Theorem 13. Let $p \in [0,1]$ be a constant. Then there exists a positive integer $n_0 \in \mathbb{N}$ such
that for all $n \ge n_0$ the expected run time of the (1 + 1) EA$_{\vec{p}}$ with $\vec{p} = (p, \ldots, p)$ on OneMax$_n$
is $2^{\Omega(n)}$.


It is quite intuitive that for large p the expected optimization time of the (1 + 1) EA$_{\vec{p}}$ with
$\vec{p} = (p, \ldots, p)$ is very large also for small problem sizes, as in this case typically too many bits
are flipped in each iteration. This has been made precise by Witt, who showed that for p, n
with $p = \Omega(n^{\varepsilon - 1})$, the run time of the (1 + 1) EA$_{\vec{p}}$ is $2^{\Omega(n^{\varepsilon})}$ with probability at least
$1 - 2^{-\Omega(n^{\varepsilon})}$ [11, Theorem 6.3].

For LeadingOnes we get a similar lower bound from Lemma 11. 

Theorem 14. Let $p \in (0, 1/2)$. Then the expected run time of the (1 + 1) EA$_{\vec{p}}$ with $\vec{p} = (p, \ldots, p)$
on LeadingOnes$_n$ is $2^{\Omega(n)}$.

Proof. From Lemma 11 we have that the expected run time of the (1 + 1) EA$_{\vec{p}}$ is, for n large
enough,

\[ \frac{1}{2p^2} \left( (1-p)^{-n+1} - (1-p) \right) \ge \frac{1}{2p^2} \left( e^{pn - p} - 1 \right) = 2^{\Omega(n)}. \]

□
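The exponential growth for a fixed mutation rate is easy to observe numerically from the closed formula of Lemma 11. The following sketch evaluates that formula; the function names are chosen here for illustration only.

```python
import math


def leading_ones_runtime(n: int, p: float) -> float:
    """Expected run time from Lemma 11 for the uniform mutation rate p."""
    return ((1.0 - p) ** (-n + 1) - (1.0 - p)) / (2.0 * p * p)


def exponential_lower_bound(n: int, p: float) -> float:
    """The bound (e^{p(n-1)} - 1) / (2 p^2), valid since 1/(1 - p) >= e^p."""
    return (math.exp(p * (n - 1)) - 1.0) / (2.0 * p * p)


if __name__ == "__main__":
    for n in (50, 100, 200, 400):
        # Fixed rate p = 0.1 versus the length-aware rate p = 1/n.
        print(n, leading_ones_runtime(n, 0.1), leading_ones_runtime(n, 1.0 / n))
```

For the fixed rate p = 0.1 the value grows by a factor of roughly $0.9^{-100}$ each time n increases by 100, whereas the length-aware choice p = 1/n gives only quadratic growth in n.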


4.2 Non-Uniform Bit Flip Probabilities 

One way to achieve efficient optimization with unknown solution length is to use non-uniform
mutation rates, that is, to flip different bit positions with different probabilities
during a mutation operation.

To state our results we need the concept of summable sequences. Such sequences will be
the basis for the sequence of bit flip probabilities. A brief discussion of summable sequences
can be found in Section A in the appendix. In short, a sequence $(p_i)_{i \in \mathbb{N}}$ is summable if its
series $(\sum_{k=1}^{n} p_k)_{n \in \mathbb{N}}$ converges (that is, if it is bounded). The advantage of using summable
sequences is that the probability that a given bit is the only one flipped is at least a constant
fraction of its own flip probability, regardless of the total number of bits considered, i.e.,
regardless of the problem length $n$. This is in contrast to the sequence $(1/(i+1))_{i \in \mathbb{N}}$
considered in [4], which is not summable, and which has a chance of
$(1/2) \prod_{i=2}^{n} (1 - 1/(i+1)) = 1/(n+1)$ of flipping only the first bit and a chance of
$(1/(n+1)) \prod_{i=1}^{n-1} (1 - 1/(i+1)) = 1/(n(n+1))$ of flipping only the $n$th bit. For this reason the
(1 + 1) EA with these bit flip probabilities is very inefficient for the setting in which the
solution length can be arbitrary.
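The effect can be checked with exact arithmetic. The sketch below computes the probability that a single given bit is the only one flipped, once for the rates $1/(i+1)$ discussed above and once for a summable sequence. The sequence $1/(i+1)^2$ and all function names are assumptions chosen for illustration; they are not taken from [4] or from the results below.

```python
from fractions import Fraction


def prob_only_bit_flips(rates, k):
    """Probability that bit k (1-indexed) flips and every other bit stays unchanged."""
    prob = Fraction(1)
    for i, rate in enumerate(rates, start=1):
        prob *= rate if i == k else 1 - rate
    return prob


def harmonic_like_rates(n):
    """Rates 1/(i+1) for i = 1..n, the non-summable sequence discussed above."""
    return [Fraction(1, i + 1) for i in range(1, n + 1)]


def quadratic_rates(n):
    """Illustrative summable rates 1/(i+1)^2 for i = 1..n (an assumption)."""
    return [Fraction(1, (i + 1) ** 2) for i in range(1, n + 1)]


if __name__ == "__main__":
    for n in (10, 100, 1000):
        h, q = harmonic_like_rates(n), quadratic_rates(n)
        # The first column decays with n (order 1/n); the second stays above 1/6.
        print(n, float(prob_only_bit_flips(h, 1)), float(prob_only_bit_flips(q, 1)))
```

For the rates $1/(i+1)$ the chance of flipping only the first bit is of order $1/n$ and that of flipping only the last bit is of order $1/n^2$, while for the summable example the chance of flipping only the first bit stays bounded away from zero.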

Theorems 15 and 16 show that not knowing the solution length $n$ does not harm the run time
by more than a factor of order $\log^{1+\varepsilon} n$ with respect to the optimal bound when the problem
length is known a priori, cf. also Corollary 17 for an explicit sequence yielding this bound. In
fact, we prove that the additional cost caused by not knowing the solution length in advance is
even a bit smaller, cf. the comments after Corollary 17.

We start with the theorem regarding OneMax. 

Theorem 15. Let $(p_i)_{i \in \mathbb{N}}$ be a monotonically decreasing summable sequence with $\Sigma :=
\sum_{i=1}^{\infty} p_i < 1$. Then, for any $n \in \mathbb{N}$, the expected run time of the (1 + 1) EA$_{\vec{p}}$ with $\vec{p} = (p_1, \ldots, p_n)$
on OneMax$_n$ is at most $\log n/(p_n(1 - \Sigma)) = O(\log n/p_n)$.

Proof. We make use of the multiplicative drift theorem [5, Theorem 3] and show that for every
$n$ and every search point $x$ with $n - k$ ones, the probability to create in one iteration of the
(1 + 1) EA$_{\vec{p}}$ with $\vec{p} = (p_1, \ldots, p_n)$ a search point $y$ with OneMax$_n(y) >$ OneMax$_n(x)$ is at
least of order $k p_n$. This can in fact be seen quite easily: since the sequence is monotonically
decreasing, each of the $k$ zero-bits is flipped with probability at least $p_n$, so that, by a union
bound over the remaining bits, the probability to increase the OneMax-value of $x$ by exactly
one is at least

$$k p_n \prod_{j=1}^{n} (1 - p_j) \ge k p_n \Big(1 - \sum_{j=1}^{n} p_j\Big) \ge k p_n \Big(1 - \sum_{j=1}^{\infty} p_j\Big) = k p_n (1 - \Sigma).$$

From this an upper bound of $\log n/(p_n(1 - \Sigma))$ for the expected run time of the (1 + 1) EA$_{\vec{p}}$ follows
immediately from the multiplicative drift theorem. □
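For completeness, the drift computation behind this bound can be spelled out. This is a sketch under the assumption that the multiplicative drift theorem is applied in its standard form (an expected decrease of at least $\delta$ times the current value yields an expected hitting time of at most $(1 + \ln(s_0/s_{\min}))/\delta$); the symbols $X_t$, $\delta$, $s_0$, and $s_{\min}$ are introduced here for illustration.

```latex
% X_t: number of zero-bits of the current search point after t iterations.
\begin{align*}
  E[X_t - X_{t+1} \mid X_t = k] &\ge k\, p_n (1 - \Sigma) = \delta k
      \quad\text{with } \delta := p_n (1 - \Sigma), \\
  E[T] &\le \frac{1 + \ln(s_0 / s_{\min})}{\delta}
      \le \frac{1 + \ln n}{p_n (1 - \Sigma)}
      = O\!\left(\frac{\log n}{p_n}\right),
\end{align*}
% using s_0 <= n for the initial number of zero-bits and s_min = 1.
```

The first line holds because the fitness never decreases, so the expected progress is at least the probability of gaining exactly one.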

Next we consider LeadingOnes. The proof follows along similar lines as the one for
OneMax and uses a fitness level argument instead of multiplicative drift (using additive drift would
also be possible).

Theorem 16. Let $(p_i)_{i \in \mathbb{N}}$ be a monotonically decreasing summable sequence with $\Sigma :=
\sum_{i=1}^{\infty} p_i < 1$. Then, for any $n \in \mathbb{N}$, the expected run time of the (1 + 1) EA$_{\vec{p}}$ with $\vec{p} = (p_1, \ldots, p_n)$
on LeadingOnes$_n$ is at most $n/(p_n(1 - \Sigma)) = O(n/p_n)$.

Proof. Let $n, k \in \mathbb{N}$ with $k \le n$ and let $x \in \{0,1\}^n$ with $\mathrm{Lo}(x) = k - 1$. The probability to get
in one iteration of the (1 + 1) EA$_{\vec{p}}$ with $\vec{p} = (p_1, \ldots, p_n)$ a search point $y$ with $\mathrm{Lo}(y) > \mathrm{Lo}(x)$
is at least

$$p_k \prod_{i=1}^{k-1} (1 - p_i) \ge p_k \Big(1 - \sum_{i=1}^{k-1} p_i\Big) \ge p_k (1 - \Sigma) \ge p_n (1 - \Sigma).$$

By a simple fitness level argument (see, e.g., the work by Sudholt [10] for background and
examples of this method), the expected run time of the (1 + 1) EA$_{\vec{p}}$ on LeadingOnes$_n$ is thus
at most $n/(p_n(1 - \Sigma))$. □
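The fitness level argument can be written out explicitly. The following is a sketch; $s_k$ is a name introduced here for the lower bound on the probability of leaving the fitness level with Lo-value $k - 1$.

```latex
\begin{align*}
  s_k &:= p_k (1 - \Sigma) \ge p_n (1 - \Sigma) \qquad (1 \le k \le n), \\
  E[T] &\le \sum_{k=1}^{n} \frac{1}{s_k}
        \le \sum_{k=1}^{n} \frac{1}{p_n (1 - \Sigma)}
        = \frac{n}{p_n (1 - \Sigma)}.
\end{align*}
```

Each of the at most $n$ levels is left after an expected number of at most $1/s_k$ iterations, and summing these waiting times gives the stated bound.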

It is well known that for every constant $\varepsilon > 0$ the sequence $(1/(i \log^{1+\varepsilon} i))_{i \in \mathbb{N}}$ is summable
(this can be proven via Cauchy's condensation test). It is obviously also monotonically decreasing
in $i$. Theorems 15 and 16, together with the sequence $(p_i)_{i \in \mathbb{N}} := (1/(2S\, i \log^{1+\varepsilon} i))_{i \in \mathbb{N}}$ for
$S := \sum_{i \in \mathbb{N}} 1/(i \log^{1+\varepsilon} i)$, therefore imply the following corollary.

Corollary 17. For every positive constant $\varepsilon$ there exists a sequence of mutation probabilities
$(p_i)_{i \in \mathbb{N}}$ such that for any $n$ the expected run time of the (1 + 1) EA$_{\vec{p}}$ with $\vec{p} = (p_1, \ldots, p_n)$ on
OneMax$_n$ is $O(n \log^{2+\varepsilon} n)$ and is of order $n^2 \log^{1+\varepsilon} n$ for LeadingOnes$_n$.
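A small simulation can make the corollary tangible. The sketch below runs a (1 + 1) EA with position-dependent flip probabilities of the form $1/(c\,(i+1) \log_2^{1+\varepsilon}(i+1))$. The index shift $i+1$ (to avoid $\log 1 = 0$), the fixed constant $c = 4$ in place of the exact normalization, the choice $\varepsilon = 1$, and all function names are assumptions made for illustration; with these choices the infinite sum of the rates stays below 0.3, so the condition $\Sigma < 1$ of Theorems 15 and 16 holds.

```python
import math
import random


def rates(n, eps=1.0, scale=4.0):
    """p_i = 1 / (scale * (i+1) * log2(i+1)^(1+eps)) for i = 1..n (assumed form)."""
    return [1.0 / (scale * (i + 1) * math.log2(i + 1) ** (1.0 + eps))
            for i in range(1, n + 1)]


def one_max(x):
    """Number of ones in x."""
    return sum(x)


def leading_ones(x):
    """Length of the longest prefix of x consisting only of ones."""
    count = 0
    for bit in x:
        if bit == 0:
            break
        count += 1
    return count


def run_ea(fitness, n, rng, max_iters=10**7):
    """(1+1) EA with position-dependent rates; returns the iterations used."""
    p = rates(n)
    x = [rng.randint(0, 1) for _ in range(n)]
    fx = fitness(x)
    for t in range(max_iters):
        if fx == n:              # both test functions have optimum value n
            return t
        y = [1 - b if rng.random() < p_i else b for b, p_i in zip(x, p)]
        fy = fitness(y)
        if fy >= fx:             # accept offspring that are at least as good
            x, fx = y, fy
    if fx == n:
        return max_iters
    raise RuntimeError("iteration budget exhausted before reaching the optimum")


if __name__ == "__main__":
    rng = random.Random(2015)
    for n in (8, 16, 32):
        print(n, run_ea(one_max, n, rng), run_ea(leading_ones, n, rng))
```

Note that the same infinite sequence of rates is used for every $n$; only its first $n$ entries are read, which is exactly what makes the approach independent of the solution length.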

The bound from Corollary 17 can be improved by regarding the following summable sequences.

For any $r \in \mathbb{R}$ and any $s \in \mathbb{N}_{\ge 2}$ let

$$\log^{(s)} r := \begin{cases} \log_2(\log^{(s-1)} r), & \text{if } \log^{(s-1)}(r) \ge 2; \\ 1, & \text{otherwise;} \end{cases}$$

where $\log^{(1)} r := \log_2 r$ if $r \ge 2$ and $\log^{(1)} r := 1$ otherwise. For every constant $\varepsilon > 0$ and all
positive integers $s, i$ let

$$p_i^{s,\varepsilon} := \frac{1}{i \, (\log^{(s)}(i))^{1+\varepsilon} \prod_{j=1}^{s-1} \log^{(j)}(i)}. \qquad (6)$$

For every $\varepsilon > 0$ and every $s \ge 1$ the sequence $(p_i^{s,\varepsilon})_{i \in \mathbb{N}}$ is summable. Furthermore, this sequence
clearly is monotonically decreasing. Choosing larger and larger $s$ therefore gives better and
better asymptotic run time bounds in Theorems 15 and 16.
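The iterated logarithm and the entries of the sequence (6) are straightforward to compute. The following sketch follows the definitions above directly; the function names `iter_log` and `p_entry` are introduced here for illustration.

```python
import math


def iter_log(r: float, s: int) -> float:
    """log^{(s)} r as defined above: apply log2 repeatedly while the value is >= 2."""
    if s < 1:
        raise ValueError("s must be a positive integer")
    value = math.log2(r) if r >= 2 else 1.0      # log^{(1)} r
    for _ in range(s - 1):                        # log^{(2)} r, ..., log^{(s)} r
        value = math.log2(value) if value >= 2 else 1.0
    return value


def p_entry(i: int, s: int, eps: float) -> float:
    """Entry p_i^{s,eps} of the sequence in equation (6)."""
    product = 1.0
    for j in range(1, s):                         # j = 1, ..., s-1
        product *= iter_log(i, j)
    return 1.0 / (i * iter_log(i, s) ** (1.0 + eps) * product)


if __name__ == "__main__":
    for i in (2, 16, 256, 2**16, 2**64):
        print(i, [p_entry(i, s, 1.0) for s in (1, 2, 3)])
```

For small $i$ a larger $s$ does not necessarily give a larger entry; the improvement mentioned above is an asymptotic statement.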


4.3 Randomized Bit Flip Probability 

In the conclusions of [4] the authors ask the following: how can we optimize efficiently when
an upper bound $N$ on the problem length is known, but only $n$ bits at unknown positions
are relevant for the fitness? It is not difficult to see that our previous solutions with
non-uniform bit flip probabilities will not be able to assign appropriate bit flip probabilities to
the relevant bit positions. However, any uniform choice of bit flip probabilities will effectively
ignore irrelevant bit positions. In this section we consider a variation of the (1 + 1) EA where
the bit flip probability $p$ is chosen randomly from a distribution $Q$ on $(0,1)$ in each iteration
(the distribution $Q$ does not change over time). This mutation probability is then applied
independently to each bit, i.e., each bit of the current best solution is independently flipped
with probability $p$. See Algorithm 2 for the detailed description of the (1 + 1) EA$_Q$.


Algorithm 2: The (1 + 1) EA$_Q$ for a distribution $Q$ on $(0,1)$ optimizing a pseudo-Boolean
function $f : \{0,1\}^n \to \mathbb{R}$.

    Initialization: Sample $x \in \{0,1\}^n$ uniformly at random and query $f(x)$;
    Optimization: for $t = 1, 2, 3, \ldots$ do
        Sample bit flip probability $p_t$ from $Q$;
        for $i = 1, \ldots, n$ do
            With probability $p_t$ set $y_i \leftarrow 1 - x_i$ and set $y_i \leftarrow x_i$ otherwise;
        Query $f(y)$;
        if $f(y) \ge f(x)$ then $x \leftarrow y$;
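Algorithm 2 translates almost line by line into code. The following is a minimal sketch, not a reference implementation: the stopping test `is_optimal`, the iteration budget `max_iters`, and the callable `sample_rate` (which draws $p_t$ from $Q$) are hypothetical helpers that the pseudocode leaves implicit, and ties are assumed to be accepted, as is common for the (1 + 1) EA.

```python
import random
from typing import Callable, List, Sequence


def one_plus_one_ea_q(
    fitness: Callable[[Sequence[int]], float],
    n: int,
    sample_rate: Callable[[random.Random], float],
    is_optimal: Callable[[float], bool],
    rng: random.Random,
    max_iters: int = 10**6,
) -> int:
    """Run Algorithm 2 and return the number of iterations until an optimum is held."""
    x: List[int] = [rng.randint(0, 1) for _ in range(n)]     # initialization
    fx = fitness(x)
    for t in range(max_iters):
        if is_optimal(fx):
            return t
        p_t = sample_rate(rng)                                # p_t sampled from Q
        y = [1 - b if rng.random() < p_t else b for b in x]   # same rate for every bit
        fy = fitness(y)
        if fy >= fx:                                          # keep y if it is not worse
            x, fx = y, fy
    if is_optimal(fx):
        return max_iters
    raise RuntimeError("iteration budget exhausted before reaching an optimum")


if __name__ == "__main__":
    n = 20
    rng = random.Random(7)
    # Toy distribution Q (an assumption): rate 1/j with j uniform in {1, ..., 2n}.
    toy_q = lambda r: 1.0 / r.randint(1, 2 * n)
    print(one_plus_one_ea_q(sum, n, toy_q, lambda v: v == n, rng))
```

Passing the distribution as a callable keeps the algorithm itself independent of the particular choice of $Q$.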

To make the problem more explicit, we are asked to find a distribution $Q$ on $[0,1]$ such that
the (1 + 1) EA$_Q$ efficiently optimizes for any $n \in \mathbb{N}$ and any pairwise different $b_1, \ldots, b_n \in \mathbb{N}$
the functions

$$\mathrm{OneMax}_{b_1,\ldots,b_n}(x) := \sum_{i=1}^{n} x_{b_i}, \quad \text{respectively}$$

$$\mathrm{LeadingOnes}_{b_1,\ldots,b_n}(x) := \max\{i \in [0..n] \mid \forall j \le i : x_{b_j} = 1\}.$$

In Theorems 18 and 19 we show that such a distribution $Q$ exists. That is, there is a distribution
$Q$ such that the corresponding (1 + 1) EA$_Q$ efficiently optimizes any OneMax$_{b_1,\ldots,b_n}$ and any
LeadingOnes$_{b_1,\ldots,b_n}$ function, regardless of the number of relevant bits and regardless of their
positions.
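The two generalized test functions can be sketched in a few lines. The code below assumes that the search point `x` is a 0/1 sequence over all available positions and that the relevant positions `b` are given 1-indexed, as in the definitions above; the function names are introduced here for illustration.

```python
from typing import Sequence


def one_max_b(x: Sequence[int], b: Sequence[int]) -> int:
    """Number of ones among the relevant positions b_1, ..., b_n (1-indexed)."""
    return sum(x[pos - 1] for pos in b)


def leading_ones_b(x: Sequence[int], b: Sequence[int]) -> int:
    """Largest i such that x[b_1] = ... = x[b_i] = 1, following the order of b."""
    count = 0
    for pos in b:
        if x[pos - 1] != 1:
            break
        count += 1
    return count


if __name__ == "__main__":
    x = [1, 0, 1, 1, 0]
    print(one_max_b(x, [1, 3, 4]), leading_ones_b(x, [3, 2, 1]))
```

Bits of `x` outside the positions listed in `b` do not influence either value, which is why the algorithm cannot tell from the fitness alone which positions are relevant before flipping them.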

We start with our main result regarding OneMax. 

Theorem 18. Let $(p_i)_{i \in \mathbb{N}} \in (0,1)^{\mathbb{N}}$ be a monotonically decreasing summable sequence. Set
$\Sigma := \sum_{j=1}^{\infty} p_j$. Let $Q$ be the distribution which assigns the mutation probability $1/i$ a probability
of $p_i/\Sigma$.

For any $n \in \mathbb{N}$ and any pairwise different positive integers $b_1, \ldots, b_n$ the expected run time
of the (1 + 1) EA$_Q$ on OneMax$_{b_1,\ldots,b_n}$ is $O(\log(n)/p_{2n})$.

Proof. The probability to sample a mutation probability between $1/(2n)$ and $1/n$ is

$$\frac{1}{\Sigma} \sum_{j=n}^{2n} p_j \ge \frac{n p_{2n}}{\Sigma}.$$

We disregard all iterations in which we do not sample a mutation probability between $1/(2n)$
and $1/n$ (they can only be beneficial). Thus, on average, we consider at least one iteration out of
$\Sigma/(n p_{2n})$.

Assuming that $x$ is a search point with $n - \ell$ ones (in the relevant positions) and that the
sampled bit flip probability $p$ satisfies $1/(2n) \le p \le 1/n$, the probability to make a progress of
exactly one is at least

$$\ell p (1 - p)^{n-1} \ge \frac{\ell}{2n} \Big(1 - \frac{1}{n}\Big)^{n-1} \ge \frac{\ell}{2en}.$$

Thus, we have an expected progress in each iteration of at least

$$\frac{\ell}{2en} \cdot \frac{n p_{2n}}{\Sigma} = \Omega(\ell p_{2n}),$$

where we use that $\Sigma$ is a constant. Therefore, by the multiplicative drift theorem [5, Theorem 3],
we need in expectation $O(\log(n)/p_{2n})$ iterations to optimize the function OneMax$_{b_1,\ldots,b_n}$. □
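The key quantity in this proof, the chance of drawing a mutation probability between $1/(2n)$ and $1/n$, can be checked exactly for a concrete distribution. The sketch below uses the summable sequence $p_i = 1/(i(i+1))$, which is an assumption chosen because its sum is exactly 1 and because it can be sampled exactly; it is not the near-optimal sequence (6), and for it the bound of Theorem 18 evaluates to $O(n^2 \log n)$ only. All function names are introduced here for illustration.

```python
import math
import random
from fractions import Fraction


def p(i: int) -> Fraction:
    """Illustrative summable sequence p_i = 1/(i(i+1)); its sum over all i is 1."""
    return Fraction(1, i * (i + 1))


def window_probability(n: int) -> Fraction:
    """Probability under Q of drawing a rate 1/j with n <= j <= 2n (here Sigma = 1)."""
    return sum((p(j) for j in range(n, 2 * n + 1)), Fraction(0))


def sample_rate(rng: random.Random) -> float:
    """Draw 1/I with P(I = i) = p_i, using P(I >= i) = 1/i (inverse transform)."""
    u = 1.0 - rng.random()            # uniform on (0, 1]
    return 1.0 / math.floor(1.0 / u)


if __name__ == "__main__":
    rng = random.Random(3)
    n, trials = 10, 100_000
    hits = sum(1.0 / (2 * n) <= sample_rate(rng) <= 1.0 / n for _ in range(trials))
    print(float(window_probability(n)), hits / trials, float(n * p(2 * n)))
```

The exact window probability, its empirical estimate, and the lower bound $n p_{2n}$ used in the proof are printed side by side.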

For LeadingOnes we obtain the following. 

Theorem 19. Let $(p_i)_{i \in \mathbb{N}}$ and $Q$ be as in Theorem 18.

For any $n \in \mathbb{N}$ and any pairwise different $b_1, \ldots, b_n \in \mathbb{N}$ the expected run time of the
(1 + 1) EA$_Q$ on LeadingOnes$_{b_1,\ldots,b_n}$ is $O(n/p_{2n})$.

Proof. This proof follows along similar lines as the one for OneMax. We have again that the
probability to have a bit flip probability between $1/(2n)$ and $1/n$ in an iteration is at least $n p_{2n}/\Sigma$.
Let $x$ be a search point with LeadingOnes$_{b_1,\ldots,b_n}(x) = \ell < n$. Given a mutation probability $p$
between $1/(2n)$ and $1/n$, the probability to create in one iteration of the (1 + 1) EA$_Q$ a search
point $y$ of fitness greater than $\ell$ is at least

$$p (1 - p)^{\ell} \ge \frac{1}{2n} \Big(1 - \frac{1}{n}\Big)^{n-1} \ge \frac{1}{2en}.$$

Thus, the probability to increase the fitness in an iteration is at least

$$\frac{1}{2en} \cdot \frac{n p_{2n}}{\Sigma} = \Omega(p_{2n}),$$

where we use that $\Sigma$ is a constant. Therefore, by the fitness level method (see again [10] for a
discussion of this method), we need in expectation $O(n/p_{2n})$ iterations to optimize
LeadingOnes$_{b_1,\ldots,b_n}$. □

By choosing the summable sequence with entries as in (6) and s = 1, the two theorems 
above immediately yield the following result. 

Corollary 20. The expected run time of the described (1 + 1) EA$_Q$ with $Q$ using the summable
sequence (6) with $s = 1$ on OneMax$_{b_1,\ldots,b_n}$ is $O(n \log^{2+\varepsilon} n)$ and on LeadingOnes$_{b_1,\ldots,b_n}$ it is
$O(n^2 \log^{1+\varepsilon} n)$.
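The substitution behind this corollary is short enough to write out. The following is a sketch that plugs the entry of sequence (6) for $s = 1$ and index $2n$ into the bounds of Theorems 18 and 19, using that $\log^{(1)}(2n) = \log_2(2n)$ for all $n \ge 1$.

```latex
\begin{align*}
  p_{2n}^{1,\varepsilon} &= \frac{1}{2n \, (\log^{(1)}(2n))^{1+\varepsilon}}
      = \frac{1}{2n \log_2^{1+\varepsilon}(2n)}, \\
  \frac{\log n}{p_{2n}^{1,\varepsilon}} &= 2n \log n \cdot \log_2^{1+\varepsilon}(2n)
      = O\!\left(n \log^{2+\varepsilon} n\right), \\
  \frac{n}{p_{2n}^{1,\varepsilon}} &= 2n^2 \log_2^{1+\varepsilon}(2n)
      = O\!\left(n^2 \log^{1+\varepsilon} n\right).
\end{align*}
```

The factor 2 in the index only affects the constants hidden in the $O$-notation.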

Note that, just as discussed after Corollary 17, choosing larger and larger $s$ gives
asymptotically better and better bounds.

5 Summary and Outlook 

We have analyzed the performance of variants of the (1 + 1) EA in the presence of unknown 
solution lengths. While for highly concentrated solution lengths non-uniform mutation probabilities
are not advantageous (or at least not to a significant degree), they are crucial in a
setting in which we do not have any knowledge about the solution length. Surprisingly, even 
in the latter situation, a sequence of (non-uniform) mutation probabilities exists such that the 
corresponding (1 + 1) EA is almost optimal, simultaneously for all possible solution lengths. 

We have also investigated a setting in which the relevant bit positions can be arbitrary in 
number and position. Possibly even more surprisingly, even this can be handled quite efficiently 
by a (1 + 1) EA variant for the two test functions OneMax and LeadingOnes. 

We believe the setting of unknown solution length to be relevant for numerous real-world 
applications. As a next step toward a better understanding of how this uncertainty can be 
tackled efficiently with evolutionary algorithms, we suggest investigating more challenging
function classes, e.g., starting with the class of all linear functions. It is not clear a priori if 
bounds similar to the ones presented in Section 4 can be achieved for such problems. 

From a mathematical point of view it would also be interesting to investigate the tightness of
our bounds in Section 4. We do not know whether some choice of mutation probabilities gives
an upper bound of $O(n \log n)$ for OneMax or $O(n^2)$ for LeadingOnes. We recall that the
sequences $(1/(n \log(n)))_{n \in \mathbb{N}}$ as well as $(p_i^{\infty,\varepsilon})_{i \in \mathbb{N}}$ with $p_i^{\infty,\varepsilon} := \lim_{s \to \infty} p_i^{s,\varepsilon}$ are not summable.
Removing the gap entirely is therefore likely to require a substantially different approach.

A Summable Sequences

For a sequence $p = (p_i)_{i \in \mathbb{N}}$ the $k$-th term of its associated series is the partial sum $\Sigma_k(p) =
\sum_{i=1}^{k} p_i$. The sequence $p$ is said to be summable if its associated series converges, i.e., if
$\lim_{k \to \infty} \Sigma_k$ exists. For $p \in \mathbb{R}_{\ge 0}^{\mathbb{N}}$ this is the case if and only if the sequence $(\Sigma_k)_{k \in \mathbb{N}}$ (note
that the series forms a sequence itself) is bounded. The limit $\lim_{k \to \infty} \Sigma_k$ is often abbreviated
by $\sum_{i=1}^{\infty} p_i$, a notation that we adopt here as well.

It is well known that the sequence $(1/n^2)_{n \in \mathbb{N}}$ is summable. Similarly, for all $\varepsilon > 0$ the
sequence $(1/n^{1+\varepsilon})_{n \in \mathbb{N}}$ is summable, while the harmonic sequence $(1/n)_{n \in \mathbb{N}}$ is not. Note that
the latter is (up to an index shift) the sequence of non-uniform bit flip probabilities used in
the work of Cathabard et al. [4].
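The difference between the two kinds of sequences is easy to observe numerically. The sketch below compares partial sums of $1/i^2$, which stay below $\pi^2/6$, with partial sums of the harmonic sequence, which grow without bound; the helper name `partial_sum` is introduced here for illustration.

```python
import math


def partial_sum(term, upto):
    """Return term(1) + term(2) + ... + term(upto), summed accurately."""
    return math.fsum(term(i) for i in range(1, upto + 1))


if __name__ == "__main__":
    for upto in (10, 1_000, 100_000):
        squares = partial_sum(lambda i: 1.0 / i**2, upto)   # bounded by pi^2 / 6
        harmonic = partial_sum(lambda i: 1.0 / i, upto)      # grows like ln(upto)
        print(f"k={upto:7d}  sum 1/i^2 = {squares:.6f}   sum 1/i = {harmonic:.4f}")
```

Finite partial sums can of course only illustrate the behaviour; they do not prove summability.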

For our purposes in Section 4 we need summable sequences that are as large as possible (with
respect to O-notation). As the examples above show, these sequences have to be in between
$(1/n^{1+\varepsilon})_{n \in \mathbb{N}}$ and $(1/n)_{n \in \mathbb{N}}$. The sequences defined after Corollary 17 are already pretty large.
Note that for $s \to \infty$ these sequences converge to the sequence with entries

$$p_n^{\infty} := 1 \Big/ \Big(n \prod_{j=1}^{\infty} \log^{(j)}(n)\Big).$$

This sequence is well-defined (since, for each $n$, almost all terms in the product are 1), but it
is not summable. For the sake of completeness we note that there are summable sequences
which are larger than any sequence $(p_n^{s,\varepsilon})_{n \in \mathbb{N}}$, but a further discussion is beyond the scope of
this paper.